-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evolution of the Alpaka "gpu framework" #39428
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39428/32137
|
A new Pull Request was created by @makortel (Matti Kortelainen) for master. It involves the following packages:
@cmsbuild, @smuzaffar, @Dr15Jones, @makortel, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
enable gpu |
@cmsbuild, please test |
@@ -39,4 +40,20 @@ namespace traits { | |||
|
|||
} // namespace traits | |||
|
|||
namespace cms::alpakatools { | |||
// TODO: Is this the right place for the specialization? Or should it be in PortableDeviceProduct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fwyzard Any thoughts on the placement of this template specialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you comment in HeterogeneousCore/AlpakaCore/interface/alpaka/ProducerBase.h
,
a specialization of cms::alpakatools::TransferToHost template [...] should be provided in the same file where T is defined.
So I think it makes sense to have the specialisation for ALPAKA_ACCELERATOR_NAMESPACE::PortableCollection<T>
together with the declaration of ALPAKA_ACCELERATOR_NAMESPACE::PortableCollection<T>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the alternative would be to specialize the TransferToHost
for PortableDeviceCollection<T, TDev>
, which for all practical purposes should be sufficient. That would also avoid the use of ALPAKA_ACCELERATOR_NAMESPACE
in the specialization definition. (and I'm still uncertain which one would be better)
*/ | ||
|
||
#ifdef ALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED | ||
// All backends with a synchronous queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fwyzard I think all other uses of these #ifdef
s to check if the backend's device is host, and/or if the queue is blocking, could be easily replaced with e.g. if constexpr
and traits, except this one. I think it would be technically possible here as well, but I think it would be more cumbersome as the classes are quite different for the two case.
@@ -0,0 +1,52 @@ | |||
#ifndef HeterogeneousCore_AlpakaCore_interface_alpaka_ESDeviceProduct_h | |||
#define HeterogeneousCore_AlpakaCore_interface_alpaka_ESDeviceProduct_h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is one of those which should not really be in HeterogeneousCore/AlpakaCore
because of the package depending on FWCore/Framework
, it depends only on Alpaka itself, but is specific to CMS (so would not really fit in HeterogeneousCore/AlpakaInterface
).
// TODO: this place is suboptimal given that the framework's | ||
// typelookup.h is in FWCore/Utilities (i.e. independent of the | ||
// framework itself). The ESDeviceProduct should be in the same | ||
// package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is one of those which should not really be in HeterogeneousCore/AlpakaCore
because of the package depending on FWCore/Framework
. The code here depends only on Alpaka and FWCore/Utilities
, and is specific to CMS (so would not really fit in HeterogeneousCore/AlpakaInterface
).
// process.options.accelerators to AlpakaServices via | ||
// ProcessAcceleratorAlpaka. | ||
edm::Service<ALPAKA_ACCELERATOR_NAMESPACE::AlpakaService> alpakaService; | ||
if (not alpakaService->enabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to duplicate the chooseDevice()
from cms::alpakatools::chooseDevice()
in order to avoid the templating, add easily the dependence on the corresponding AlpakaService
(following the discussion in #39319 (comment)).
#ifndef HeterogeneousCore_AlpakaInterface_interface_TransferToHost_h | ||
#define HeterogeneousCore_AlpakaInterface_interface_TransferToHost_h | ||
|
||
// TODO: better package? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is one of those that should not be in a package depending on FWCore/Framework
(but in one that fulfills the DataFormat constraints), but is specific to CMS so would not really fit in HeterogeneousCore/AlpakaInterface
.
-1 Failed Tests: HeaderConsistency UnitTests Unit TestsI found errors in the following unit tests: ---> test test-das-selected-lumis had ERRORS GPU Comparison SummarySummary:
Comparison SummarySummary:
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39428/33125
|
Pull request #39428 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel, @fwyzard can you please check and sign again. |
@cmsbuild, please test Mostly for the record, the diff of the PR is the same as before. |
+heterogeneous (taking the liberty since there are no code changes since @fwyzard signing) |
+core |
This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2aec6d/29196/summary.html Comparison SummarySummary:
GPU Comparison SummarySummary:
|
+1 |
PR description:
This PR proposes an evolution of the "gpu framework" for Alpaka EDModules and ESModules. The current model (introduced in #38855) is similar to the model we have for CUDA EDModules, whereas this development builds on earlier prototypes (in cms-patatrack/pixeltrack-standalone#224, cms-patatrack/pixeltrack-standalone#256, cms-patatrack/pixeltrack-standalone#257 for CUDA, and in cms-patatrack/pixeltrack-standalone#314 for Alpaka). The new model was presented in the HLT GPU developments meeting in https://indico.cern.ch/event/1192252/#4-new-cmssw-accelerator-framew .
The two models can coexist. The EDModules/ESModules of the current model can not access the device data products of the new model. I believe (did not test) the new model EDModules would be able to use the ESProducts of the current model, and if really necessary, they could be made to use the EDProducts of the current model (but I'd prefer to avoid that).
Shortcomings of the CUDA model that this PR addresses are
ScopedContext*
destructor does not check for errors in CUDA API calls in order to avoid exception being thrown from a destructoredm::ESTransientHandle
. We'd still need to enable the actual memory optimization (Look into enabling edm::ESTransientHandle memory optimization again #31085)The device product wrapper, that is currently
cms::cuda::Product<T>
/cms::alpakatools::Product<TQueue, T>
is now made part of the framework asedm::DeviceProduct<T>
. Making it a framework type allows some future developments that would be difficult without. Currently the backend/memory space specific metadata are included in theedm::DeviceProduct<T>
in a type-erased way, such that only the correct backend code can access the data. This may stay or change in the future.This PR makes the host-synchronous backend(s) to produce the
T
directly instead of wrapping it toedm::DeviceProduct<T>
(which was left as future improvement in #38855).The PR explores three possible models for ESProducts (which are indicated in comments in the code, these were also summarized in the slides linked above)
ALPAKA_ACCELERATOR_NAMESPACE
Device
template argumentPortableCollection
The PR currently has
threetwo limitations compared to the current modelsThe automatic transfers from device to host are synchronous.The automatic transfer from device to host are now asynchronous (with the aid of Added transformAsync ability to modules #39557)This shortcoming will be improved in a future PR after an asynchronous version ofTransformer
module ability has been developed (as planned in Add Transformer ability to modules #38454)In the mean time, the blocking synchronization can be worked around with an explicit "Transcriber" module that uses thestream::SynchronizingEDProducer
(i.e.ExternalWork
module ability) as in Integration of alpaka and macro-based SoA in CMSSW #38855.ExternalWork
to ESProducers (Add ExternalWork-like mechanism for ESProducers #24185)The code has many
TODO:
comments of which most are meant for further evolution after this PR. I'm planning to address the following in this PR#ifdef
for backend-specific behavior withif constexpr
(or equivalent)FWCore/Utilities
and are specific to CMS are currently placed inHeterogeneousCore/AlpakaCore
andHeterogeneousCore/AlpakaInterface
. They should be moved out from the former, but the latter seems non-ideal as well. New package?Device
in the names, currently they are alongDeviceEvent
,EDDeviceGetToken
,global::EDProducer
(i.e. noDevice
in name).edm::global::EDProducer
andALPAKA_ACCELERATOR_NAMESPACE::global::EDProducer
(where only theglobal::EDProducer
part is visible in code)DeviceEvent
,DeviceEDGetToken
,global::DeviceEDProducer
device::Event
,device::EDGetToken
,device::global::EDProducer
(all still enclosed inALPAKA_ACCELERATOR_NAMESPACE
)Add(descoped from this PR)README.md
documenting the interfacePR validation:
Added unit test passes on both CPU and GPU.